-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat] Gitlab inclusion globbing #3500
Conversation
Included test.
…o to support globbing. Apply globbing filter when repos is not provided.
remove test to check errors if globs are invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a pretty faithful reproduction of the behavior of globbing in the GitHub source, which is smart: it's great for consistency and it reduces the chance of weird errors creeping in. I have some suggestions to improve, but they're just suggestions, especially considering the GitHub source behavior here:
- Defer to @rosecodym here, but what do you think about glob compilation errors becoming error conditions? My thinking here is scanning something you don't want to scan, or not scanning something you definitely wanted to scan is probably something you want to address upfront, so canceling the scan might be more desirable.
- You could note that
exclude
takes precedence overinclude
in the CLI flags usage info - Not 100% sure about this, but you could also make the condition where the configuration of both
include
andexclude
is such that no repos are included an error condition. The thing this would avoid is having a user typo a glob, get a ✅ from trufflehog, and think everything is peachy when they inadvertently didn't scan anything.
If you do decide to do any of these things (and to be clear IMO it's totally up to you), there's the Validator interface where you could do checking of the glob itself and any effects of the globbing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look at this! Copying the GitHub implementation is a great idea, but that implementation is more complicated than what GitLab needs - in particular, GitLab doesn't need repo caching, as far as I can tell. I've left some comments inline with more details, and feel free to reach out to me directly with questions.
pkg/sources/gitlab/gitlab.go
Outdated
// filteredRepoCache is a wrapper around cache.Cache that filters out repos | ||
// based on include and exclude globs. | ||
type filteredRepoCache struct { | ||
cache.Cache[string] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like the cache part of filteredRepoCache
is ever used, so I don't think this is really a cache! That's a good thing - the repo caching in the GitHub source is pretty complicated, and it's good that we don't have to do it here too. You can remove the embedded cache.Cache
and rename this structure - or even eliminate it entirely and pass the include/exclude globs directly to buildIgnorer
(which would mean making buildIgnorer
a little more complicated, but would also mean that you don't have to make anything else more complicated).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rosecodym the idea is to compile it very first time and just reuse it, similar to Github. Moving compilation in buildignorer will compile globs for all kind of Projects (user projects, group projects etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw I do agree that cache.Cache
is not needed in Gitlab source. I will take a look on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rosecodym I have removed the concept of caching and renamed the entity 'globRepoFilter'. I did not eliminate the whole structure as I believe having that is just making the code more clear. Let me know what are your thoughts on this.
pkg/sources/gitlab/gitlab.go
Outdated
for _, ig := range include { | ||
g, err := glob.Compile(ig) | ||
if err != nil { | ||
ctx.Logger().V(1).Info("invalid include glob", "include_value", ig, "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, I definitely think that glob compilation errors should be a fatal error. At the very least we should log it with Error
instead of Info
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted the code back to support compilation code. infact, now compilation will be done at validation
, enumeration
or chunk
func like it was done before.
modified changes to support glob compilation errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, thanks! i added a note about an unused parameter and a question about some comments, but nothing blocking. can you also confirm my reading that this pr exposes the repo exclusion globbing that was always present in the engine but was not previously exposed on the command line?
pkg/sources/gitlab/gitlab.go
Outdated
include, exclude []glob.Glob | ||
} | ||
|
||
func newGlobRepoFilter(ctx context.Context, include, exclude []string, onCompileErr func(err error, pattern string)) *globRepoFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't look like ctx
is used here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for highlighting. Right, Its not being used. Let me clean it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
@@ -266,6 +266,10 @@ type GitlabConfig struct { | |||
Filter *common.Filter | |||
// SkipBinaries allows skipping binary files from the scan. | |||
SkipBinaries bool | |||
// IncludeRepos is a list of repositories to include in the scan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a list of repositories or repository glob patterns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List of repository glob patterns.
Correct. I could not see a way to set |
Description:
As on the customer demands, similar to github source, This PR implements inclusion of globs for Gitlab sources. Two new flags are introduce similar to Github,
--include-repos
and--exclude-repos
respectively. Since, glob patterns are validated and ignore if not compiled, therefore, test to check those errors is removed.Gitlab
Soruce test results:--help
results forGitlab
source:Checklist:
make test-community
)?make lint
this requires golangci-lint)?